Algorithms for Differentially Private Multi-Armed Bandits

Authors

  • Aristide C. Y. Tossou
  • Christos Dimitrakakis
Abstract

We present differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem. This problem arises in applications such as adaptive clinical trials, experiment design, and user-targeted advertising, where private information is connected to individual rewards. Our major contribution is to show that there exist (ε, δ)-differentially private variants of Upper Confidence Bound algorithms which have optimal regret, O(ε⁻¹ + log T). This is a significant improvement over previous results, which only achieve poly-log regret O(ε⁻² log² T), because of our use of a novel interval-based mechanism. We also substantially improve the bounds of a previous family of algorithms that use a continual release mechanism. Experiments clearly validate our theoretical bounds.
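
The sketch below gives a rough, self-contained illustration of the kind of algorithm the abstract describes: a UCB index policy whose empirical means are perturbed with Laplace noise before the confidence bonus is added. It is only a minimal sketch; the laplace_noise helper, the dp_ucb interface, and the per-pull noise calibration are illustrative assumptions, and it does not reproduce the paper's interval-based mechanism or its formal (ε, δ) privacy accounting.

```python
import math
import random


def laplace_noise(scale):
    """Zero-mean Laplace sample: difference of two i.i.d. exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)


def dp_ucb(arms, horizon, epsilon):
    """Noisy UCB sketch: each arm's empirical mean is perturbed with Laplace
    noise before the exploration bonus is added, so the released index does
    not directly expose any single reward.  Illustrative only: this is not
    the paper's interval-based mechanism, and the simple per-pull noise below
    is not a full (epsilon, delta) accounting over the horizon."""
    k = len(arms)
    counts = [0] * k
    sums = [0.0] * k
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1                      # pull every arm once to initialise
        else:
            def index(i):
                mean = sums[i] / counts[i]
                # Sensitivity of a mean of counts[i] rewards in [0, 1] is
                # 1 / counts[i], so use Laplace scale 1 / (epsilon * counts[i]).
                noisy = mean + laplace_noise(1.0 / (epsilon * counts[i]))
                return noisy + math.sqrt(2.0 * math.log(t) / counts[i])
            a = max(range(k), key=index)
        r = arms[a]()                      # observe a reward in [0, 1]
        counts[a] += 1
        sums[a] += r
        total += r
    return total, counts


if __name__ == "__main__":
    # Toy example: two Bernoulli arms with means 0.7 and 0.5.
    bandit = [lambda: float(random.random() < 0.7),
              lambda: float(random.random() < 0.5)]
    reward, pulls = dp_ucb(bandit, horizon=5000, epsilon=0.5)
    print("total reward:", reward, "pulls per arm:", pulls)
```

Increasing ε shrinks the injected noise and the policy behaves like standard UCB; decreasing ε adds noise to every index and inflates regret, which is the trade-off the regret bounds above quantify.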

Similar articles

The Price of Differential Privacy for Online Learning

We design differentially private algorithms for the problem of online linear optimization in the full information and bandit settings with optimal Õ(√T) regret bounds. In the full-information setting, our results demonstrate that ε-differential privacy may be ensured for free – in particular, the regret bounds scale as O(√T) + Õ(1/ε). For bandit linear optimization, and as a special c...

(Nearly) Optimal Differentially Private Stochastic Multi-Arm Bandits

We study the problem of private stochastic multi-arm bandits. Our notion of privacy is the same as some of the earlier works in the general area of private online learning [13, 17, 24]. We design algorithms that are i) differentially private, and ii) have regret guarantees that (almost) match the regret guarantees for the best non-private algorithms (e.g., upper confidence bound sampling and Tho...

The Price of Differential Privacy for Online Learning(with Supplementary Material)

We design differentially private algorithms for the problem of online linear optimization in the full information and bandit settings with optimal Õ(√T) regret bounds. In the full-information setting, our results demonstrate that ε-differential privacy may be ensured for free – in particular, the regret bounds scale as O(√T) + Õ(1/ε). For bandit linear optimization, and as a special ...

Private Stochastic Multi-arm Bandits: From Theory to Practice

In this paper we study the problem of private stochastic multi-arm bandits. Our notion of privacy is the same as some of the earlier works in the general area of private online learning (Dwork et al., 2010; Jain et al., 2012; Smith and Thakurta, 2013). We design algorithms that are i) differentially private, and ii) have regret guarantees that (almost) match the regret guarantees for the best n...

Budgeted Bandit Problems with Continuous Random Costs

We study the budgeted bandit problem, where each arm is associated with both a reward and a cost. In a budgeted bandit problem, the objective is to design an arm pulling algorithm in order to maximize the total reward before the budget runs out. In this work, we study both multi-armed bandits and linear bandits, and focus on the setting with continuous random costs. We propose an upper confiden...

Publication date: 2016